Abstract


A function approximation method is developed that aims to approximate a 
function in a small neighborhood of a state that travels within a compact 
set. The development is based on the theory of universal reproducing kernel 
Hilbert spaces over the n-dimensional Euclidean space. Several theorems 
are introduced that support the development of this State Following (StaF)
method. In particular, it is shown that there is a bound on the number
of kernel functions required for the maintenance of an accurate function
approximation as a state moves through a compact set. Additionally, a
weight update law, based on gradient descent, is introduced where arbitrarily
close accuracy can be achieved provided the weight update law is iterated
at a sufficient frequency, as detailed in Theorem 6.1.


The advantage of the StaF method is that, for some applications, the
number of basis functions can be reduced. The StaF method is applied to
an adaptive dynamic programming (ADP) application to demonstrate that
stability is maintained with a reduced number of basis functions.

Simulation results demonstrate the utility of the StaF methodology for 
the maintenance of accurate function approximation as well as solving an 
infinite horizon optimal regulation problem through ADP. The results of 
the simulation indicate that fewer basis functions are required to guarantee
stability and approximate optimality than are required when a global
approximation approach is used. 
1. Introduction


Often in the theory of approximation, an accurate estimation of a function over a large compact set is sought. It is well known that the larger the compact set, the larger the number of basis functions required to achieve an accurate function approximation. There is a large body of literature concerned with methods for the reduction of the number of basis functions required to achieve such an approximation.

In many control applications, function approximation is used to generate a stabilizing controller of a state in a dynamical system. For instance, in adaptive dynamic programming (ADP), an approximation of the optimal value function is leveraged to produce an approximate optimal controller. Traditionally, the approximation is sought over a large compact set, and requires many basis functions. The computational resources required to tune the weights of the basis functions render real-time implementation of controllers based on ADP methods infeasible.

Motivated by problems in control theory, this paper introduces an approximation methodology that aims to establish and maintain an accurate approximation of a function in a neighborhood of a moving state in a dynamical system. The method, termed the state following (StaF) method, reduces the number of basis functions required to achieve an accurate approximation by focusing on the approximation of a function over a small neighborhood by linear combinations of time and state varying basis functions. Therefore, even in cases where processing power of on-board CPUs is limited, an accurate approximation of a function can be maintained.

The particular basis functions that will be employed throughout this paper are derived from kernel functions corresponding to RKHSs. In particular, the centers are selected to be continuous functions of the state variable whose offsets from the state are bounded by a predetermined value. That is, given a compact set $D \subset \mathbb{R}^n$, $\epsilon > 0$, $r > 0$, and $M \in \mathbb{N}$, $c_i(x) = x + d_i(x)$ where $d_i : \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable and $\sup_{x \in D} \|d_i(x)\| < r$ for $i = 1, \ldots, M$. The parameterization of a function $V : D \to \mathbb{R}$ in terms of StaF kernel functions is given by

$$\hat{V}(y; x(t), t) = \sum_{i=1}^{M} w_i(t) K(y, c_i(x(t)))$$

where $w_i(t)$ is a weight signal chosen to satisfy

$$\limsup_{t \to \infty} E_r(x(t), t) < \epsilon$$

where $E_r$ is a measure of the accuracy of an approximation in a neighborhood of $x(t)$, such as that of the supremum norm:

$$E_r(x(t), t) = \sup_{y \in N_r(x(t))} \left| V(y) - \hat{V}(y; x(t), t) \right|.$$


The goal of the StaF method is to establish and maintain an approximation of a function in a neighborhood of the state. The justification for this approach stems from the observation that an optimal controller only requires the value of the estimation of the optimal value function to be accurate at the current system state. Thus, when computational resources are limited, computational efforts should be focused on improving the accuracy of approximations near the system state.
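The StaF parameterization and the local error measure can be sketched numerically. The following Python fragment is only an illustration under stated assumptions: the exponential kernel, three centers placed on a circle around the state, and the helper names `exp_kernel`, `triangle_centers`, `staf_approx`, and `local_sup_error` are all hypothetical choices, and the supremum over the neighborhood is estimated by random sampling (which yields a lower bound on the true supremum).

```python
import numpy as np

def exp_kernel(y, c):
    """Exponential kernel K(y, c) = exp(y^T c)."""
    return np.exp(np.dot(y, c))

def triangle_centers(x, radius=0.1):
    """Hypothetical center functions c_i(x) = x + d_i: three points on a circle around x."""
    angles = 2.0 * np.pi * np.arange(3) / 3.0
    offsets = radius * np.stack([np.sin(angles), np.cos(angles)], axis=1)
    return x + offsets  # shape (3, 2), one center per row

def staf_approx(y, x, weights):
    """Evaluate V_hat(y; x) = sum_i w_i K(y, c_i(x)) for a two-dimensional state x."""
    centers = triangle_centers(x)
    return sum(w * exp_kernel(y, c) for w, c in zip(weights, centers))

def local_sup_error(V, x, weights, r=0.1, n_samples=500, seed=0):
    """Sampled estimate (a lower bound) of sup over N_r(x) of |V(y) - V_hat(y; x)|."""
    rng = np.random.default_rng(seed)
    # Draw points uniformly from the closed disc of radius r centered at x.
    radii = r * np.sqrt(rng.uniform(size=n_samples))
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n_samples)
    ys = x + np.stack([radii * np.cos(angles), radii * np.sin(angles)], axis=1)
    return max(abs(V(y) - staf_approx(y, x, weights)) for y in ys)
```

In this sketch the weights are passed in as plain numbers; in the StaF setting they would be the values of the weight signals at the current time.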

The advantage of using RKHSs for the purpose of local approximations is twofold. First, RKHSs have been found to be effective for nonlinear function approximation [18], and the use of RKHSs can enable accurate estimations of a wide array of nonlinear functions. Second, the ideal weights corresponding to the Hilbert space norm provided by RKHSs change smoothly with respect to smooth changes in the centers, as demonstrated in Theorem 5.1, which allows the execution of weight update laws to achieve and maintain an accurate approximation. The ideal weights in the context of the StaF approximation method become a continuous function of the state and are investigated in Section 5.

Previous efforts in the literature have performed nonlinear approximation through the adjustment of the centers of radial basis functions as a means to determine the optimal centers for global approximation. These efforts are more applicable when off-line techniques can be used due to computational demands. For other applications where computational resources are limited, global approximations may not be feasible (especially as the dimension of the problem grows), nor is the optimal selection of parameters.

This paper lays the foundation for the establishment and maintenance of a real-time moving local approximation of a continuous function. Section 2 of this paper frames the particular approximation problem of the StaF method. Section 3 demonstrates accurate approximation with a fixed number of moving basis functions. Section 4 demonstrates an explicit bound on the number of required StaF basis functions for the case of the exponential kernel function. The ideal weight function arising from the StaF method is introduced and discussed in Section 5, where the existence and smoothness of the ideal weight function is established. Section 6 provides a proof of concept demonstrating the existence of weight update laws to maintain an accurate approximation of a function in a local neighborhood, ultimately establishing a uniformly ultimately bounded result. The remaining sections demonstrate the developed method through numerical experiments and discussions of applications. Specifically, Section 7 gives the results of a “gradient chase” algorithm. In Section 8 the utility of StaF methods is demonstrated in an ADP application.


2. The StaF Problem Statement 


Given a continuous function $V : \mathbb{R}^n \to \mathbb{R}$, $r > 0$, an arbitrarily small $\epsilon > 0$, and a dynamical system $\dot{x} = f(x, u)$ (where $f$ is sufficiently regular for the system to be well defined), the goal of the StaF approximation method is to select state and time varying basis functions $\sigma_i : \mathbb{R}^n \times \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}$ for $i = 1, 2, \ldots, M$ and weight signals $w_i : \mathbb{R}_+ \to \mathbb{R}$ for $i = 1, 2, \ldots, M$ such that

$$\limsup_{t \to \infty} \sup_{y \in N_r(x(t))} \left| V(y) - \sum_{i=1}^{M} w_i(t) \sigma_i(y, x(t), t) \right| < \epsilon. \quad (1)$$

In other words, the StaF approximation method aims to achieve an arbitrarily small steady state error of order $\epsilon$ in a closed neighborhood of the state, $N_r(x(t)) = \{y \in \mathbb{R}^n : \|x(t) - y\|_2 \le r\}$.

Central problems to the StaF method include determining the basis functions and the weight signals. When reproducing kernel Hilbert spaces are used for basis functions, (1) can be recast as (2), where the supremum norm is replaced with the Hilbert space norm. Since the Hilbert space norm of a RKHS dominates the supremum norm, (1) with the supremum norm is simultaneously satisfied. Moreover, when using a RKHS, the basis functions can be selected to correspond to centers placed in a moving neighborhood of the state. In particular, given a kernel function $K : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ corresponding to a (universal) RKHS, $H$, and continuous center functions $c_i : \mathbb{R}^n \to \mathbb{R}^n$ for which $d_i(x) := c_i(x) - x$ is bounded by $r$, the StaF problem becomes the determination of weight signals $w_i : \mathbb{R}_+ \to \mathbb{R}$ for $i = 1, \ldots, M$ such that:

$$\limsup_{t \to \infty} \left\| V(\cdot) - \sum_{i=1}^{M} w_i(t) K(\cdot, c_i(x(t))) \right\|_{r,x(t)} < \epsilon \quad (2)$$

where $\| \cdot \|_{r,x(t)}$ is the norm of the RKHS obtained by restricting functions in $H$ to $N_r(x(t))$.
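A short derivation, offered as a sketch rather than as part of the original development, indicates why the restricted Hilbert space norm controls the supremum norm. It assumes the standard fact that the RKHS obtained by restricting functions in $H$ to $N_r(x)$ has the restriction of $K$ as its reproducing kernel. The symbol $f$ is a placeholder for the approximation error and is not notation from the paper.

```latex
% f := V - \sum_i w_i K(\cdot, c_i), viewed in the restricted RKHS over N_r(x)
\begin{align*}
|f(y)| &= \left| \langle f, K(\cdot, y) \rangle_{r,x} \right|
        \le \|f\|_{r,x} \, \|K(\cdot, y)\|_{r,x}
        = \|f\|_{r,x} \sqrt{K(y, y)}, \qquad y \in N_r(x), \\
\sup_{y \in N_r(x)} |f(y)| &\le \Big( \sup_{y \in N_r(x)} \sqrt{K(y, y)} \Big) \, \|f\|_{r,x}.
\end{align*}
```

Under this reading, a bound on the Hilbert space norm yields a supremum norm bound up to the factor $\sup_y \sqrt{K(y, y)}$, which is finite when $K$ is continuous and the state remains in a compact set.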

Since (2) implies (1), the focus of this paper is to demonstrate the feasibility of satisfying (2). Theorem 3.1 demonstrates that under a certain continuity assumption a bound on the number of kernel functions necessary for the maintenance of an approximation throughout a compact set can be determined, and Theorem 5.1 shows that a collection of continuous ideal weight functions can be determined to satisfy (2). Theorem 5.1 justifies the use of weight update laws for the maintenance of an accurate function approximation, and this is demonstrated by Theorem 6.1 as well as the numerical results contained in Section 7 and Appendix A.


The RKHS of particular interest in this paper is that which corresponds to the exponential kernel $K(x, y) = \exp(x^T y)$ where $x, y \in \mathbb{R}^n$; this space is closely connected to the Bargmann-Fock space [23].

The RKHS corresponding to the exponential kernel is a universal RKHS [25], which means that given any compact set $D \subset \mathbb{R}^n$, $\epsilon > 0$, and continuous function $f : D \to \mathbb{R}$, there exists a function $\hat{f}$ in this RKHS for which $\sup_{x \in D} |f(x) - \hat{f}(x)| < \epsilon$.

3. Feasibility of the StaF Approximation and the Ideal Weight Functions

The first theorem concerning the StaF method demonstrates that if the state variable is constrained to a compact subset of $\mathbb{R}^n$, then there is a finite number of StaF basis functions required to establish the accuracy of an approximation.


Theorem 3.1. Suppose that $K : X \times X \to \mathbb{C}$ is a continuous kernel function corresponding to a RKHS, $H$, over a set $X$ equipped with a metric topology. If $V \in H$, $D$ is a compact subset of $X$, $r > 0$, and $\|V\|_{r,x}$ is continuous with respect to $x$, then for all $\epsilon > 0$ there is an $M \in \mathbb{N}$ such that for each $x \in D$ there are centers $c_1, c_2, \ldots, c_M \in N_r(x)$ and weights $w_i \in \mathbb{C}$ such that

$$\left\| V(\cdot) - \sum_{i=1}^{M} w_i K(\cdot, c_i) \right\|_{r,x} < \epsilon.$$


Proof. Given $\epsilon > 0$, for each neighborhood $N_r(x)$ with $x \in D$, there exists a finite number of centers $c_1, \ldots, c_M \in N_r(x)$, and weights $w_1, \ldots, w_M \in \mathbb{C}$, such that

$$\left\| V(\cdot) - \sum_{i=1}^{M} w_i K(\cdot, c_i) \right\|_{r,x} < \epsilon.$$

Let $M_{x,\epsilon}$ be the minimum such number. The claim of the theorem is that the set $Q_\epsilon := \{M_{x,\epsilon} : x \in D\}$ is bounded. Assume by way of contradiction that $Q_\epsilon$ is unbounded, and take a sequence $\{x_n\} \subset D$ such that $M_{x_n,\epsilon}$ is a strictly increasing sequence and $x_n \to x$ in $D$. It is always possible to find such a convergent sequence, since every compact subset of a metric space is sequentially compact. Let $c_1, \ldots, c_{M_{x,\epsilon/2}} \in N_r(x)$ and $w_1, \ldots, w_{M_{x,\epsilon/2}} \in \mathbb{C}$ be centers and weights for which

$$E(x) := \left\| V(\cdot) - \sum_{i=1}^{M_{x,\epsilon/2}} w_i K(\cdot, c_i) \right\|_{r,x} < \epsilon/2. \quad (3)$$

For convenience, let each $c_i \in N_r(x)$ be expressed as $x + d_i$ for $d_i \in N_r(0)$. The function $E(x)$ in (3) can be written as

$$E(x) = \left( \|V\|_{r,x}^2 - 2\,\mathrm{Re}\left( \sum_{i=1}^{M_{x,\epsilon/2}} \overline{w_i}\, V(x + d_i) \right) + \sum_{i,j=1}^{M_{x,\epsilon/2}} \overline{w_i}\, w_j K(x + d_i, x + d_j) \right)^{1/2}.$$

By the hypothesis, $K$ is continuous with respect to $x$, which implies that $V$ is continuous [2], and $\|V\|_{r,x}$ is continuous with respect to $x$. Hence, there exists $\eta > 0$ for which $|E(x) - E(x_n)| < \epsilon/2$ for all $x_n \in N_\eta(x)$. Thus $E(x_n) < E(x) + \epsilon/2 < \epsilon$ for sufficiently large $n$. By minimality, $M_{x_n,\epsilon} \le M_{x,\epsilon/2}$ for sufficiently large $n$. This is a contradiction. □


The assumption of the continuity of $\|V\|_{r,x}$ in Theorem 3.1 is well founded. There are several examples where the assumption is known to hold. For instance, if the RKHS is a space of real entire functions, as it is for the exponential kernel, then $\|V\|_{r,x}$ is not only continuous, but it is constant.

Using an argument similar to that in Theorem 3.1, the theorem can be shown to hold when the restricted Hilbert space norm is replaced by the supremum norm over $N_r(x)$. The proof of the following proposition can be found in the preliminary work for this article in [1].


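Proposition 3.2. Let $D$ be a compact subset of $\mathbb{R}^n$, $V : \mathbb{R}^n \to \mathbb{R}$ be a continuous function, and $K : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ be a continuous and universal kernel function. For all $\epsilon, r > 0$, there exists $M \in \mathbb{N}$ such that for each $x \in D$, there is a collection of centers $c_1, \ldots, c_M \in N_r(x)$ and weights $w_1, \ldots, w_M \in \mathbb{R}$ such that

$$\sup_{y \in N_r(x)} \left| V(y) - \sum_{i=1}^{M} w_i K(y, c_i) \right| < \epsilon.$$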


4. Explicit Bound for the Exponential Kernel 

Theorem 3.1 establishes a bound on the number of kernel functions required for the maintenance of the accuracy of a moving local approximation. However, the proof does not provide an algorithm to computationally determine the upper bound. Even when the approximation with kernel functions is performed over a fixed compact set, a general bound for the number of collocation nodes required for accurate function approximation under the Hilbert space norm is unknown. Thus, it is desirable to have a computationally determinable upper bound to the number of StaF basis functions required to yield an arbitrarily close approximation. Theorem 4.1 provides a calculable bound on the number of exponential functions required to yield an arbitrarily close approximation with respect to the supremum norm. That is, Theorem 4.1 provides a computable analogue of Theorem 3.1 and Proposition 3.2 for a StaF approximation problem of the form

$$\limsup_{t \to \infty} \sup_{y \in N_r(x(t))} \left| V(y) - \sum_{i=1}^{M} w_i(t) K(y, c_i(x(t))) \right| < \epsilon.$$

While error bounds have been computed for the exponential function with respect to the supremum norm (cf. [26]), current literature allows the “frequencies” or centers of the exponential kernel functions to be unconstrained. The lack of constraints on the centers of the exponential kernel functions means that the existing results cannot be leveraged for the StaF approximation problem. The contribution of Theorem 4.1 is the development of an error bound while constraining the size of the centers.


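Theorem 4.1. Let $K : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ given by $K(x, y) = \exp(x^T y)$ be the exponential kernel function. Let $D \subset \mathbb{R}^n$ be a compact set, $V : D \to \mathbb{R}$ continuous, and $\epsilon, r > 0$. For each $x \in D$, there exists a finite number of centers $c_1, \ldots, c_{M_{x,\epsilon}} \in N_r(x)$ and weights $w_1, w_2, \ldots, w_{M_{x,\epsilon}} \in \mathbb{R}$, such that

$$\sup_{y \in N_r(x)} \left| V(y) - \sum_{i=1}^{M_{x,\epsilon}} w_i K(y, c_i) \right| < \epsilon.$$

If $p$ is an approximating polynomial that achieves the same accuracy over $N_r(x)$ with degree $N_{x,\epsilon}$, then an asymptotically similar bound can be found with $M_{x,\epsilon}$ kernel functions, where $M_{x,\epsilon} \le \binom{n + N_{x,\epsilon} + S_{x,\epsilon}}{N_{x,\epsilon} + S_{x,\epsilon}}$ for some constant $S_{x,\epsilon}$ that is the degree of an approximating polynomial. Moreover, $N_{x,\epsilon}$ and $S_{x,\epsilon}$ can be bounded uniformly over $D$, and thus, so can $M_{x,\epsilon}$.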


Proof. For notational simplicity, the quantity $\|f\|_{D,\infty}$ denotes the supremum norm of a function $f : D \to \mathbb{R}$ over the compact set $D$ throughout the proof of Theorem 4.1.
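First, consider the ball of radius $r$ centered at the origin. The statement of the theorem can be proved by finding an approximation of monomials by a linear combination of exponential kernel functions.

Let $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ be a multi-index, and define $|\alpha| = \sum_{i=1}^{n} \alpha_i$. Note that^1

$$m^{|\alpha|} \prod_{i=1}^{n} \left( \exp(y_i/m) - 1 \right)^{\alpha_i} = y_1^{\alpha_1} y_2^{\alpha_2} \cdots y_n^{\alpha_n} + O(1/m)$$

which by the binomial theorem leads to the sum

$$m^{|\alpha|} \sum_{0 \le l_i \le \alpha_i} \left[ \prod_{i=1}^{n} \binom{\alpha_i}{l_i} \right] (-1)^{|\alpha| - |l|} \exp\left( \sum_{i=1}^{n} \frac{l_i}{m} y_i \right) = y_1^{\alpha_1} y_2^{\alpha_2} \cdots y_n^{\alpha_n} + O(1/m). \quad (4)$$

The big-oh constant indicated by $O(1/m)$ can be computed in terms of the derivatives of the exponential function via Taylor's Theorem. The centers corresponding to this approximation are of the form $l_i/m$ where $l_i$ is a non-negative integer satisfying $l_i \le \alpha_i$. Hence, for $m$ sufficiently large, the centers reside in $N_r(0)$.

To shift the centers so that they reside in $N_r(x)$, let $x = (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n$, and multiply both sides of (4) by $\exp(y^T x)$ to get

$$m^{|\alpha|} \sum_{0 \le l_i \le \alpha_i} \left[ \prod_{i=1}^{n} \binom{\alpha_i}{l_i} \right] (-1)^{|\alpha| - |l|} \exp\left( \sum_{i=1}^{n} y_i \left( x_i + \frac{l_i}{m} \right) \right) = e^{y^T x} \left( y_1^{\alpha_1} y_2^{\alpha_2} \cdots y_n^{\alpha_n} + O(1/m) \right).$$

For each multi-index, $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_n)$, the centers for the approximation of the corresponding monomial are of the form $x_i + l_i/m$ for $0 \le l_i \le \alpha_i$. Thus, by linear combinations of these kernel functions, a function of the form $e^{y^T x} g(y)$, where $g$ is a multivariate polynomial, can be uniformly approximated by exponential functions over $N_r(x)$. Moreover, if $g$ is a polynomial of degree $\beta$, then this approximation can be a linear combination of $\binom{n + \beta}{\beta}$ kernel functions.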
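The monomial approximation at the heart of this construction can be checked numerically in one dimension. The sketch below is illustrative only: the function name `monomial_by_exponentials` and the choices of degree and `m` are hypothetical, and the code simply expands $m^k (e^{y/m} - 1)^k$ by the binomial theorem into a linear combination of exponential kernels with centers $l/m$.

```python
import numpy as np
from math import comb

def monomial_by_exponentials(y, k, m):
    """Approximate y**k by m**k * (exp(y/m) - 1)**k, written as a linear
    combination of exponential kernels exp(y * c_l) with centers c_l = l / m."""
    centers = np.arange(k + 1) / m  # l / m for l = 0, ..., k
    weights = np.array(
        [m**k * comb(k, l) * (-1) ** (k - l) for l in range(k + 1)], dtype=float
    )
    approx = sum(w * np.exp(y * c) for w, c in zip(weights, centers))
    return approx, centers

# Usage: the error shrinks roughly like 1/m while the centers move toward the origin.
ys = np.linspace(-0.5, 0.5, 101)
for m in (10, 100):
    approx, centers = monomial_by_exponentials(ys, 3, m)
    print(m, centers.max(), np.max(np.abs(approx - ys**3)))
```

For large `m` the weights grow like $m^k$ and alternate in sign, so in floating point arithmetic the cancellation eventually limits the attainable accuracy; the sketch keeps `m` moderate for that reason.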


^1 The notation $g_m(y) = O(f(m))$ means that for sufficiently large $m$, there is a constant $C$ for which $|g_m(y)| < C f(m)$ for all $y \in N_r(0)$.
Two polynomials, $p_x$ and $q_x$, are selected to approximate $V$ and $e^{-y^T x}$, respectively, over $N_r(x)$. Since $V$ is a continuous function, it can be approximated with arbitrary accuracy by polynomials. Subsequently, the previous development will be utilized to approximate the polynomials by linear combinations of exponential functions.

Let $\epsilon' > 0$ and suppose that $p_x$ is a polynomial with degree $N_{x,\epsilon'}$ such that

$$p_x(y) = V(y) + \epsilon_1(y)$$

where $|\epsilon_1(y)|$ is bounded in terms of $\epsilon'$ for all $y \in N_r(x)$. Let $q_x(y)$ be a polynomial in $n$ variables of degree $S_{x,\epsilon'}$ such that

$$q_x(y) = e^{-y^T x} + \epsilon_2(y)$$

where $|\epsilon_2(y)|$ is bounded in terms of $\epsilon'$, $\|V\|_{D,\infty}$, and $\|e^{y^T x}\|_{D,\infty}$ for all $y \in N_r(x)$.

The above construction indicates that there is a sequence of linear combinations of exponential kernel functions, $F_m(y)$, (with a fixed number of centers inside $N_r(x)$) for which

$$F_m(y) = e^{y^T x} q_x(y) p_x(y) + O(1/m) = e^{y^T x} \left( e^{-y^T x} + \epsilon_2(y) \right) \left( V(y) + \epsilon_1(y) \right) + O(1/m).$$

After multiplication and an application of the triangle inequality, $|F_m(y) - V(y)|$ is bounded, for all $y \in N_r(x)$, by the sum of $\epsilon'$, a term proportional to $\epsilon'^2$, and $O(1/m)$. The degree of the polynomial $q_x$, $S_{x,\epsilon'}$, can be uniformly bounded in terms of the modulus of continuity of $e^{y^T x}$ over $D$. Similarly, the uniform bound on the degree of $p_x$, $N_{x,\epsilon'}$, can be described in terms of the modulus of continuity of $V$ over $D$. The number of centers required for $F_m(y)$ is determined by the degree of the polynomial $q \cdot p$ (treating the $x$ terms of $q$ as constant), which is the sum of the two polynomial degrees. Finally, for $m$ large enough and $\epsilon'$ small enough, $|F_m(y) - V(y)| < \epsilon$, and the proof is complete. □


Theorem 4.1 demonstrates an upper bound on the number of kernel functions required for the accurate approximation of a function through the estimation of approximating polynomials. Moreover, the upper bound is a function of the polynomial degrees. For example, for a neighborhood of the origin in $\mathbb{R}$, if $p$ is an approximating polynomial of degree $N$, then the same order of approximation can be achieved by a linear combination of $N + 1$ exponential functions. The exponential kernel will be used for simulations in Section 7 and Appendix A.


5. Existence and Smoothness of the Ideal Weight Function 


Theorem 3.1 and Proposition 3.2 establish that given a kernel function, a finite number of centers can be used to yield an arbitrarily accurate estimation of a function, for a set of ideal weights. Theorem 4.1 further establishes that for the exponential kernel function, a calculable number of centers can be determined. However, further investigation is required to understand the characteristics of the ideal weights that correspond to the moving centers. For example, in control applications involving function approximation or system identification, it is assumed that there is a collection of constant ideal weights, and much of the theory is in the demonstration of the convergence of approximate weights to the ideal weights. The subsequent Theorem 5.1 establishes that ideal weights, which are functions of the state dependent centers, are $m$-times continuously differentiable. This property can then be used to develop weight update laws (e.g., see Section 6).

Since the ideal weights corresponding to a Hilbert space norm are unique, Theorem 5.1 is framed in the Hilbert space setting of (2). Thus, Theorem 5.1 together with Theorem 3.1 provides the StaF framework for RKHSs.


Theorem 5.1. Let $H$ be a RKHS over a set $X \subset \mathbb{R}^n$ with a strictly positive definite kernel $K : X \times X \to \mathbb{C}$ such that $K(\cdot, c) \in C^{m_0}(X)$ for all $c \in X$. Suppose that $V \in H$. Let $C$ be an ordered collection of $M$ distinct centers, $C = (c_1, c_2, \ldots, c_M) \in X^M$, with the associated ideal weights

$$W(C) = \arg\min_{a \in \mathbb{C}^M} \left\| \sum_{i=1}^{M} a_i K(\cdot, c_i) - V(\cdot) \right\|_H. \quad (5)$$

The function $W$ is $m_0$-times continuously differentiable with respect to each component of $C$.


Proof. The determination of $W(C)$ is equivalent to computing the projection of $V$ onto the space $Y = \mathrm{span}\{K(\cdot, c_i) : i = 1, \ldots, M\}$. To compute the projection, a Gram-Schmidt algorithm can be employed. The Gram-Schmidt algorithm is most easily expressed in its determinant form. Let $D_0 = 1$ and $D_m = \det\left( K(c_j, c_i) \right)_{i,j=1}^{m}$, then for $m = 1, \ldots, M$ the functions

$$u_m(x) := \frac{1}{\sqrt{D_{m-1} D_m}} \det \begin{pmatrix} K(c_1, c_1) & K(c_1, c_2) & \cdots & K(c_1, c_m) \\ K(c_2, c_1) & K(c_2, c_2) & \cdots & K(c_2, c_m) \\ \vdots & \vdots & \ddots & \vdots \\ K(c_{m-1}, c_1) & K(c_{m-1}, c_2) & \cdots & K(c_{m-1}, c_m) \\ K(x, c_1) & K(x, c_2) & \cdots & K(x, c_m) \end{pmatrix}$$

constitute an orthonormal basis for $Y$. Since $K$ is strictly positive definite, $D_m$ is positive for each $m$ and every $C$. The coefficient for each $K(x, c_l)$ with $l = 1, \ldots, m$ in $u_m$ is a sum of products of the terms $K(c_i, c_j)$ for $i, j = 1, \ldots, m$. Each such coefficient is $m_0$-times differentiable with respect to each $c_i$, $i = 1, \ldots, M$. When $\langle V, u_m \rangle$ is computed for the projection, the result is a linear combination of evaluations of $V$ at each of the centers. The function $V$ is $m_0$-times continuously differentiable, since $K$ is $m_0$-times differentiable [25]; therefore $\langle V, u_m \rangle$ is $m_0$-times continuously differentiable with respect to the centers. Finally, each term in $W(C)$ is a linear combination of the coefficients determined by $u_m$ for $m = 1, \ldots, M$, and thus is $m_0$-times continuously differentiable with respect to each $c_i$ for $i = 1, \ldots, M$. □
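In practice the projection can also be computed from the normal equations $K(c)\, w = V(c)$ rather than by Gram-Schmidt. The sketch below assumes the real exponential kernel and uses hypothetical helper names (`gram_matrix`, `ideal_weights`); it is meant only to illustrate that the ideal weights are determined by the Gram matrix and the samples of $V$ at the centers, and that they move a little when the centers move a little.

```python
import numpy as np

def gram_matrix(centers):
    """Gram matrix with entries K(c_i, c_j) = exp(c_i^T c_j) (exponential kernel)."""
    return np.exp(centers @ centers.T)

def ideal_weights(V_at_centers, centers):
    """Solve K(c) w = V(c): the normal equations of the projection of V
    onto span{K(., c_i)} when V lies in the RKHS."""
    return np.linalg.solve(gram_matrix(centers), V_at_centers)

# Usage: a test function built from two kernel functions centered at q (hypothetical choice).
q = np.array([[0.3, -0.2], [-0.1, 0.4]])
V = lambda y: 2.0 * np.exp(y @ q[0]) - np.exp(y @ q[1])

centers = np.array([[0.5, 0.0], [0.0, 0.5], [-0.5, -0.5]])
w0 = ideal_weights(np.array([V(c) for c in centers]), centers)

# Shift every center slightly and recompute the weights.
shifted = centers + 1e-3
w1 = ideal_weights(np.array([V(c) for c in shifted]), shifted)
print(np.linalg.norm(w1 - w0))  # small change in centers, small change in weights
```

When the centers are close together the Gram matrix becomes ill-conditioned, so a direct solve of this kind may need regularization or higher precision; that concern is separate from the smoothness statement of the theorem.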


6. The Gradient Chase Theorem 

As mentioned before, control theory problems involving function approximation and system identification are centered around the concept of weight update laws. Weight update laws are a collection of rules that the approximating weights must obey which lead to convergence to the ideal weights. In the case of the StaF approximation framework, the ideal weights are replaced with ideal weight functions. Theorem 5.1 showed that if the moving centers of the StaF kernel functions are selected in such a way that the centers adjust smoothly with respect to the state $x$, then the ideal weight functions will also change smoothly with respect to $x$. Thus, in this context, weight update laws of the StaF approximation framework aim to achieve an estimation of the ideal weight function at the current state.

Theorem 6.1 provides an example of such weight update laws that achieve a uniformly ultimately bounded (UUB) result. The theorem takes advantage of perfect samples of a function in the RKHS $H$ corresponding to a real valued kernel function.

The proof of the theorem is similar to the standard proof for the convergence of the gradient descent algorithm for a quadratic programming problem. The contribution of the proof is in a modification, where the mean value theorem is used to produce an extra term which results in a UUB result, and the continuity of the largest and smallest eigenvalues of a Gram matrix is used to get a uniform bound in tandem with the Kantorovich inequality.


Theorem 6.1 (Gradient Chase Theorem). Let $H$ be a real valued RKHS over $\mathbb{R}^n$ with a continuously differentiable strictly positive definite kernel function $K : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$. Let $V \in H$, $D \subset \mathbb{R}^n$ be a compact set, and $x : \mathbb{R} \to \mathbb{R}^n$ a state variable subject to the dynamical system $\dot{x} = q(x, t)$, where $q : \mathbb{R}^n \times \mathbb{R}_+ \to \mathbb{R}^n$ is a bounded locally Lipschitz continuous function. Further suppose that $x(t) \in D$ for all $t > 0$. Let $c : \mathbb{R}^n \to (\mathbb{R}^n)^M$, where for each $i = 1, \ldots, M$, $c_i(x) = x + d_i(x)$ where $d_i \in C^1(\mathbb{R}^n)$, and let $a \in \mathbb{R}^M$. Consider the function

$$F(a, c) = \left\| V - \sum_{i=1}^{M} a_i K(\cdot, c_i(x)) \right\|_H^2.$$

At each time instance $t > 0$, there is a unique $W(t)$ for which $W(t) = \arg\min_{a \in \mathbb{R}^M} F(a, c(x(t)))$. Given any $\epsilon > 0$ and initial value $a^0$, there is a frequency $\tau > 0$, where if the gradient descent algorithm (with respect to $a$) is iterated at time steps $\Delta t < \tau^{-1}$, then $F(a^k, c^k) - F(w^k, c^k)$ will approach a neighborhood of radius $\epsilon$ as $k \to \infty$.


Proof. Let $\epsilon > 0$. By the Hilbert space structure of $H$:

$$F(a, c) = \|V\|_H^2 - 2 V(c)^T a + a^T K(c) a$$

where $V(c) = (V(c_1), \ldots, V(c_M))^T$ and $K(c) = (K(c_i, c_j))_{i,j=1}^{M}$ is the symmetric strictly positive definite kernel matrix corresponding to $c$. At each time iteration $t^k$, $k = 0, 1, 2, \ldots$, the corresponding centers and weights can be written as $c^k \in (\mathbb{R}^n)^M$ and $a^k \in \mathbb{R}^M$, respectively. The ideal weights corresponding to $c^k$ will be denoted by $w^k$. It can be shown that $w^k = K(c^k)^{-1} V(c^k)$ and $F(w^k, c^k) = \|V\|_H^2 - V(c^k)^T K(c^k)^{-1} V(c^k)$. Theorem 5.1 ensures that the ideal weights change continuously with respect to the centers, which remain in a compact set $\tilde{D}^M$, where $\tilde{D} = \{x \in \mathbb{R}^n : \|x - D\| \le \max_{i=1,\ldots,M} (\sup_{x \in D} |d_i(x)|)\}$, so the collection of ideal weights is bounded. Let $R > \epsilon$ be large enough so that $N_R(0)$ contains both the initial value $a^0$ and the set of ideal weights. To facilitate the subsequent analysis, consider the constants

$$R_0 = \max_{x \in D,\, t > 0} |q(x, t)|, \qquad R_1 = \max_{a \in N_R(0),\, c \in \tilde{D}^M} |\nabla_a F(a, c)|,$$

$$R_2 = \max_{c \in \tilde{D}^M} |\nabla_c F(w(c), c)|, \qquad R_3 = \max_{c \in \tilde{D}^M} \left| \tfrac{d}{dt} d_i(x(t)) \right|,$$

together with $R_4$, which is likewise defined as a maximum over $c \in \tilde{D}^M$, where $\nabla_a$ is the gradient with respect to $a$, and let $\Delta t < \tau^{-1} := \epsilon \cdot \left( 2 (R_0 + R_3) \cdot (R_1 \cdot R_4 \cdot (R_0 + R_3) + R_2 + 1) \right)^{-1}$. The proof aims to show that by using the gradient descent law for choosing $a^k$, the following inequality can be achieved:

$$\frac{F(a^{k+1}, c^{k+1}) - F(w^{k+1}, c^{k+1})}{F(a^k, c^k) - F(w^k, c^k)} < \delta + \frac{\epsilon}{F(a^k, c^k) - F(w^k, c^k)}$$

for some $0 < \delta < 1$. Set

$$a^{k+1} = a^k + \lambda g \quad (6)$$

where $g = -\nabla_a F(a^k, c^k) = 2 V(c^k) - 2 K(c^k) a^k$ and $\lambda$ is selected so that the quantity $F(a^k + \lambda g, c^k)$ is minimized. The $\lambda$ that minimizes this quantity is $\lambda = \frac{g^T g}{2 g^T K(c^k) g}$, and $F(a^{k+1}, c^k) = F(a^k, c^k) - \frac{(g^T g)^2}{4 g^T K(c^k) g}$. Since $F$ is continuously differentiable in the second variable, we have $F(a^{k+1}, c^{k+1}) = F(a^{k+1}, c^k) + \nabla_c F(a^{k+1}, \eta) \cdot (c^{k+1} - c^k)$. Since $|\dot{c}(x(t))| < R_0 + R_3$, an application of the mean value theorem demonstrates that $\|c^{k+1} - c^k\| < (R_0 + R_3) \Delta t$. Thus

$$F(a^{k+1}, c^{k+1}) = F(a^{k+1}, c^k) + \epsilon_1(t^k),$$

where $|\epsilon_1(t^k)| < \epsilon/2$ for all $k$. The quantity $F$ is continuously differentiable in both variables. Thus, by the multi-variable chain rule and another application of the mean value theorem:

$$F(w^{k+1}, c^{k+1}) = F(w^k, c^k) + \epsilon_2(t^k),$$

for $|\epsilon_2(t^k)| < \epsilon/2$ for all $k$. Therefore, the following is established:

$$\frac{F(a^{k+1}, c^{k+1}) - F(w^{k+1}, c^{k+1})}{F(a^k, c^k) - F(w^k, c^k)} = \frac{F(a^{k+1}, c^k) - F(w^k, c^k) + (\epsilon_1(t^k) - \epsilon_2(t^k))}{F(a^k, c^k) - F(w^k, c^k)}$$

$$= 1 - \frac{(g^T g)^2}{(g^T K(c^k) g)(g^T K(c^k)^{-1} g)} + \frac{\epsilon_1(t^k) - \epsilon_2(t^k)}{F(a^k, c^k) - F(w^k, c^k)}.$$

The Kantorovich inequality yields

$$1 - \frac{(g^T g)^2}{(g^T K(c^k) g)(g^T K(c^k)^{-1} g)} \le \left( \frac{A_{c^k}/a_{c^k} - 1}{A_{c^k}/a_{c^k} + 1} \right)^2 \quad (7)$$

where $A_{c^k}$ is the largest eigenvalue of $K(c^k)$ and $a_{c^k}$ is the smallest eigenvalue of $K(c^k)$. The quantity on the right of (7) is continuous with respect to $A_{c^k}$ and $a_{c^k}$. In turn, $A_{c^k}$ and $a_{c^k}$ are continuous with respect to $K(c^k)$ (cf. Exercise 4.1.6 [28]), which is continuous with respect to $c^k$. Therefore there is a largest value, $\delta$, that the right hand side of (7) attains on the compact set $\tilde{D}^M$, and this value is less than 1. Moreover, $\delta$ is independent of $\epsilon$, so $\epsilon$ may be replaced by $\epsilon(1 - \delta)$. Finally,

$$\frac{F(a^{k+1}, c^{k+1}) - F(w^{k+1}, c^{k+1})}{F(a^k, c^k) - F(w^k, c^k)} < \delta + \frac{\epsilon (1 - \delta)}{F(a^k, c^k) - F(w^k, c^k)}.$$

Therefore, setting $e(k) = F(a^k, c^k) - F(w^k, c^k)$, it can be shown that $e(k+1) < \delta e(k) + \epsilon(1 - \delta)$ and the conclusion of the theorem follows. □
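The single gradient step used in the proof can be written out for a fixed set of centers. The sketch below is an illustration under assumptions: the step size is derived here from the quadratic form of $F$ (minimizing $F(a + \lambda g, c)$ over $\lambda$), the function names are hypothetical, and the example matrix is an arbitrary symmetric positive definite stand-in for a Gram matrix. It compares the observed contraction of the excess cost $F(a, c) - F(w, c)$ with the Kantorovich bound.

```python
import numpy as np

def excess_cost(a, K, Vc):
    """F(a, c) - F(w, c) = (a - w)^T K (a - w), where w = K^{-1} V(c)."""
    w = np.linalg.solve(K, Vc)
    return float((a - w) @ K @ (a - w))

def gradient_step(a, K, Vc):
    """One exact-line-search gradient step for
    F(a, c) = ||V||^2 - 2 V(c)^T a + a^T K a."""
    g = 2.0 * Vc - 2.0 * K @ a          # g = -grad_a F(a, c)
    gKg = g @ K @ g
    if gKg == 0.0:                       # gradient vanishes: already at the minimizer
        return a
    lam = (g @ g) / (2.0 * gKg)          # minimizes F(a + lam * g, c) over lam
    return a + lam * g

def kantorovich_rate(K):
    """Bound ((A/a - 1) / (A/a + 1))^2 on the per-step contraction of the excess cost."""
    eig = np.linalg.eigvalsh(K)          # eigenvalues in ascending order
    kappa = eig[-1] / eig[0]
    return ((kappa - 1.0) / (kappa + 1.0)) ** 2

# Usage with an arbitrary symmetric positive definite matrix (hypothetical data).
K = np.array([[2.0, 0.5], [0.5, 1.0]])
Vc = np.array([1.0, -1.0])
a0 = np.zeros(2)
a1 = gradient_step(a0, K, Vc)
print(excess_cost(a1, K, Vc) / excess_cost(a0, K, Vc), kantorovich_rate(K))
```

With the centers frozen, the printed ratio stays below the Kantorovich bound. When the centers move between steps, the extra terms $\epsilon_1$ and $\epsilon_2$ appear, and the recursion $e(k+1) < \delta e(k) + \epsilon(1 - \delta)$ gives $\limsup_k e(k) \le \epsilon$.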

7. Simulation for the Gradient Chase Theorem 

To demonstrate the effectiveness of the Gradient Chase theorem, a simulation performed on a two-dimensional linear system is presented below. The system dynamics are given by

$$\begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},$$

which is the dynamical system corresponding to a circular trajectory. The function to be approximated is

$$V(x_1, x_2) = x_1^2 + 5 x_2^2 + \tanh(x_1 \cdot x_2),$$

and the kernel function used for function approximation is the exponential kernel, $K(x, y) = \exp(x^T y)$. The centers are arranged in an equilateral triangle centered about the state. In particular, each center resides on a circle of radius 0.1 centered at the state:

$$c_i(x) = x + 0.1 \begin{pmatrix} \sin((i - 1) 2\pi/3) \\ \cos((i - 1) 2\pi/3) \end{pmatrix}$$

for $i = 1, 2, 3$.
The initial values selected for the weights are $a^0 = [0\ 0\ 0]^T$. The gradient descent weight update law, given by (6), is applied for 10 iterations per time-step, and the time-steps are incremented every 0.01 seconds. Figure 1 presents the results of the simulation.
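A rough re-creation of this experiment can be sketched as follows. Several details are assumptions rather than facts from the text: the initial state `x0 = (1, 0)`, the simulated duration, the use of the exact rotation solution of the linear system instead of a numerical integrator, the reading of the target function with both quadratic terms squared, and the exact-line-search form of the step. The function name `simulate` is hypothetical, and the sketch is not the authors' code.

```python
import numpy as np

def V(x):
    """Target function as read from the text (both quadratic terms assumed squared)."""
    return x[0] ** 2 + 5.0 * x[1] ** 2 + np.tanh(x[0] * x[1])

# Offsets d_i: equilateral triangle on a circle of radius 0.1 around the state.
ANGLES = 2.0 * np.pi * np.arange(3) / 3.0
OFFSETS = 0.1 * np.stack([np.sin(ANGLES), np.cos(ANGLES)], axis=1)

def simulate(x0=(1.0, 0.0), dt=0.01, steps=1000, iters=10):
    """Gradient chase sketch: returns rows (t, V(x(t)), V_hat(x(t)))."""
    x0 = np.asarray(x0, dtype=float)
    a = np.zeros(3)                       # initial weights a^0 = [0 0 0]^T
    history = []
    for k in range(steps):
        t = k * dt
        # Exact solution of x' = [[0, 1], [-1, 0]] x (a rotation of x0).
        rot = np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])
        x = rot @ x0
        centers = x + OFFSETS
        K = np.exp(centers @ centers.T)   # Gram matrix of the exponential kernel
        Vc = np.array([V(c) for c in centers])
        for _ in range(iters):            # gradient steps at the current centers
            g = 2.0 * Vc - 2.0 * K @ a
            gKg = g @ K @ g
            if gKg <= 0.0:
                break
            a = a + (g @ g) / (2.0 * gKg) * g
        approx = float(a @ np.exp(centers @ x))   # V_hat evaluated at the state
        history.append((t, V(x), approx))
    return np.array(history)

history = simulate()
print(np.abs(history[:, 1] - history[:, 2]).max())  # largest error along the run
```

The printed quantity is the largest pointwise error along the simulated trajectory; its size depends on the assumed initial state and duration, so it should not be read as a reproduction of the reported figures.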

Figure 1d demonstrates that the function approximation error is regulated to a small neighborhood of zero as the Gradient Chase Theorem is implemented and validates the claim of the UUB result of Theorem 6.1. In Figure 1c, approximations of the ideal weight function can be seen to be periodic as well as smooth. The smoothness of the ideal weight function itself is given in Theorem 5.1, and the periodicity of the approximation follows from the periodicity of the selected dynamical system, as illustrated in Figure 1a. Figure 1b presents a comparison of $V$ evaluated at the current state to that of the approximation evaluated at the current state. Approximation of the function is maintained as the system state moves through its domain, as anticipated.


8. Application to Adaptive Dynamic Programming 

The application of approximation theory to the theory of optimal control 
arises through the approximation of the optimal value function, which is the 
solution to the Hamilton-Jacobi-Bellman (HJB) equation. Efficient methods 
for the approximation of the optimal value function are essential, since an 
increase in dimension can lead to an exponential increase in the number of
basis functions necessary to achieve an accurate approximation, the
so-called “curse of dimensionality”.

The optimal value function corresponds to the infinite horizon optimal regulator problem, where the cost function

$$J(x, u) = \int_0^\infty x^T Q x + u^T R u \, dt$$

is to be minimized subject to the dynamics

$$\dot{x}(t) = f(x(t)) + g(x(t)) u(t) \quad (8)$$

where $x : \mathbb{R}_+ \to \mathbb{R}^n$, $u : \mathbb{R}_+ \to \mathbb{R}^m$, $Q \in \mathbb{R}^{n \times n}$, $R \in \mathbb{R}^{m \times m}$ with $Q$ and $R$ positive definite, $f : \mathbb{R}^n \to \mathbb{R}^n$, and $g : \mathbb{R}^n \to \mathbb{R}^{n \times m}$. Moreover, $f$ and $g$ are assumed to be locally Lipschitz. The optimal value function is given by

$$V(x) = \inf_{u \in \mathcal{U}} \int_0^\infty x^T Q x + u^T R u \, dt$$

where $\mathcal{U}$ is the collection of admissible controllers. When the optimal value function is continuously differentiable and an optimal controller, $u^* \in \mathcal{U}$, exists, the optimal value function is the unique solution to the HJB equation

$$0 = x^T Q x + u^{*T} R u^* + \nabla V(x) (f(x) + g(x) u^*). \quad (9)$$

Once the optimal value function is determined, the optimal controller takes the form

$$u^*(x(t)) = -\frac{1}{2} R^{-1} g(x)^T \nabla V(x(t))^T. \quad (10)$$

In many applications, an approximation of the optimal controller is used in real time to yield autonomous behavior in a dynamic environment.

Figure 1: Results of the numerical experiment demonstrating the Gradient Chase algorithm. (a) Phase portrait of the dynamical system: trajectory of the state vector. (b) Actual and estimated function: comparison of $V$ and the approximation $\hat{V}$. (c) Weight estimates: the values of the weight function estimates. (d) Function estimation error versus time (s): error committed by the approximation at the current state.

For some problems, such as the linear quadratic regulator (LQR) problem, 
the optimal value function takes a particular form which simplifies the 
choice of basis functions. In the case of LQR, the optimal value function 
is of the form $\sum_{i,j=1}^{n} w_{ij} x_j x_i$ (cf. [29, 30]), so basis functions of the form 
$\sigma_{ij} = x_j x_i$ will provide an accurate estimation of the optimal value 
function provided the weights, $w_{ij} \in \mathbb{R}$, are tuned properly. However, in most 
cases, the form of the optimal value function is unknown, and generic basis 
functions have been proposed to parameterize the problem. 
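To make the LQR remark concrete, the following minimal sketch fits the weights of the quadratic basis functions $x_i x_j$ to samples of a quadratic value function by least squares. The matrix `P`, the sample set, and the function name `quadratic_features` are assumptions chosen only for illustration; in an actual LQR problem `P` would come from the Riccati equation.

```python
import numpy as np


def quadratic_features(x):
    """Basis functions sigma_ij(x) = x_i * x_j for i <= j."""
    n = len(x)
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])


# Hypothetical symmetric positive definite matrix, V(x) = x^T P x.
P = np.array([[2.0, 0.5],
              [0.5, 1.0]])

rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=(20, 2))

# Regression matrix (one row of features per sample) and target values.
Phi = np.array([quadratic_features(x) for x in samples])
targets = np.array([x @ P @ x for x in samples])

# Least-squares weights for the features (x1^2, x1*x2, x2^2).
weights, *_ = np.linalg.lstsq(Phi, targets, rcond=None)
print(weights)
```

Because only the terms with $i \le j$ are kept, the off-diagonal weight equals $2P_{12}$; with this basis the fit is exact up to rounding, which is why the choice of basis is simple in the LQR case.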

Adaptive dynamic programming (ADP) replaces $V$ with a parametrization, 
$\hat{V}(x, W_c) = \sum_{i=1}^{M} w_{i,c}\,\sigma_i(x)$ with $W_c = (w_{1,c}, \ldots, w_{M,c}) \in \mathbb{R}^M$, and 
$u^*$ with a parametrization $\hat{u}(x, W_a) = -\frac{1}{2} R^{-1} g(x)^T \nabla_x \hat{V}(x, W_a)^T$ where 
$W_a \in \mathbb{R}^M$. The actor and critic weights, $W_a$ and $W_c$ respectively, are tuned 
to minimize the residual Bellman error (BE), 

$$\delta(x, W_a, W_c) = x^T Q x + \hat{u}(x, W_a)^T R\, \hat{u}(x, W_a) + \nabla_x \hat{V}(x, W_c)\left(f(x) + g(x)\hat{u}(x, W_a)\right),$$

over all $x$ in some compact set $D$ in real-time. The BE is used to motivate 
weight update laws for $W_a$ and $W_c$ to achieve a real-time minimization. 
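The Bellman error can be evaluated directly from its definition. The sketch below does so for a hypothetical scalar linear system (the values of `a`, `b`, `Q`, `R` and the helper names are assumptions), using a quadratic parametrization $\hat{V}(x) = p x^2$ for both the critic and the actor. When both weights equal the Riccati solution the BE is zero, and it is nonzero otherwise.

```python
import numpy as np


def policy_from_gradient(x, grad_V, g, R):
    """u = -1/2 R^{-1} g(x)^T grad_V^T, the form of equation (10)."""
    return -0.5 * np.linalg.solve(R, g(x).T @ grad_V)


def bellman_error(x, grad_V_critic, u, f, g, Q, R):
    """delta = x^T Q x + u^T R u + grad_V (f(x) + g(x) u)."""
    return float(x @ Q @ x + u @ R @ u + grad_V_critic @ (f(x) + g(x) @ u))


# Hypothetical scalar system dx/dt = a x + b u with cost q x^2 + r u^2.
a, b = 1.0, 2.0
Q = np.array([[3.0]])
R = np.array([[0.5]])
f = lambda x: a * x
g = lambda x: np.array([[b]])

# Positive root of the scalar Riccati equation 2 a p - b^2 p^2 / r + q = 0.
p_star = R[0, 0] * (a + np.sqrt(a**2 + b**2 * Q[0, 0] / R[0, 0])) / b**2


def residual(p_critic, p_actor, x):
    """BE at state x for V_hat(x) = p x^2 with separate critic/actor weights."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    u = policy_from_gradient(x, 2.0 * p_actor * x, g, R)
    return bellman_error(x, 2.0 * p_critic * x, u, f, g, Q, R)


print(residual(p_star, p_star, 1.3))  # vanishes up to rounding
print(residual(1.0, 1.0, 1.0))        # mistuned weights give a nonzero BE
```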

Throughout the ADP literature, many basis functions have been proposed 
for real-time (approximate) optimal control. However, in practice, it 
is difficult to select weight update laws that guarantee stability by 
achieving a good approximation of the ideal weights, especially for a system 
with a modest embedded processor. In the majority of cases, actual 
implementations of ADP use only polynomial basis functions; the 
StaF method enables a broader class of functions to be used for approximate 
optimal control of a dynamical system. 

In this setting, the StaF problem becomes 

$$\limsup_{t \to \infty} \ \sup_{y \in N_r(x(t))} \left|\delta(y, W_a(t), W_c(t))\right| < \epsilon.$$
Appendix A provides more information concerning the application of 
the StaF method to ADP by presenting the results of a companion paper. 


9. Conclusion 


A new StaF kernel method is introduced in this paper for the purpose 
of function approximation. The development in this paper establishes that 
by using the StaF method a local approximation of a function can be 
maintained in real-time as a state moves through a compact domain. 
Heuristically, far fewer kernel functions are required in comparison to more 
traditional function approximation schemes, since the approximation is 
maintained in a smaller region. For the exponential kernels, a new theorem in 
this paper establishes that an explicit bound on the number of kernel 
functions required can be calculated. Two applications of this methodology were 
presented. First, a “gradient chase” algorithm was developed, and 
it was seen that a function may be well approximated provided that the 
algorithm is applied at a high enough frequency. Simulation results 
demonstrated the performance of the gradient chase algorithm. Second, an 
application to ADP is provided in the preceding section and the 
Appendix for an infinite horizon optimal regulation problem. 

The strength of the StaF methodology is the reduction of the 
computational requirements for real-time implementation of a function 
approximation, through the reduction in the number of basis functions. This is 
demonstrated in Appendix A, where only three basis functions were required to 
achieve a stabilizing approximate optimal controller for a 2-dimensional 
system. However, since the StaF method aims at maintaining an accurate 
approximation of the value function only in a local neighborhood of the 
current system state, the StaF kernel method lacks memory, in the sense that 
the information about the ideal weights over a region of interest is lost when 
the state leaves that region. Thus, unlike existing techniques, the 
StaF method generates an approximation that is valid only in a local region. 
A memory-based modification to the StaF kernel method that retains and 
reuses past information for creating a global approximation is the subject 
of future research. 


Appendix A. Applications to Adaptive Dynamic Programming 

To demonstrate the effectiveness of the StaF technique in the context of 
optimal control, the simulation results of a companion paper are presented 
Figure A.2: Results of the numerical experiment demonstrating the 
convergence for the StaF ADP method [22]. (a) State trajectory: trajectory of 
the state vector. (b) Policy weights: trajectory of the actor weights. 
(c) Value function weights. (d) Control estimation error against time (s): 
error committed by the approximate policy. (e) Value function estimation 
error: error of the estimation of the value function at the current state. 
here. The details of the analysis are contained in [22]. The dynamical system 
in question is of the form $\dot{x} = f(x) + g(x)u$ where $x = (x_1, x_2)^T \in \mathbb{R}^2$, 

$$f(x) = \begin{pmatrix} -x_1 + x_2 \\ -\frac{1}{2}x_1 - \frac{1}{2}x_2\left(1 - (\cos(2x_1) + 2)^2\right) \end{pmatrix}, \qquad g(x) = \begin{pmatrix} 0 \\ \cos(2x_1) + 2 \end{pmatrix}. \tag{A.1}$$

Associated with this dynamical system is the cost functional 

$$J(x,u) = \int_0^\infty \left(x^T(\tau)x(\tau) + u(\tau)^2\right) d\tau. \tag{A.2}$$

In the infinite horizon regulation problem, the goal is to determine an 
optimal control law $u^* : \mathbb{R}^2 \to \mathbb{R}$ (assuming an optimal control law exists) that 
satisfies 

$$u^*(x_0) = \arg\min_{u \in \mathcal{U}} \int_0^\infty \left(x^T(\tau)x(\tau) + u(\tau)^2\right) d\tau$$

where $\mathcal{U}$ is the collection of admissible controllers and $x(0) = x_0$ inside the 
integrand. The optimal value function is given by 

$$V^*(x_0) = \min_{u \in \mathcal{U}} \int_0^\infty \left(x^T(\tau)x(\tau) + u(\tau)^2\right) d\tau$$

when such a minimum exists, and the optimal value function satisfies the 
HJB equation. If $V^*$ satisfies the HJB equation and is also continuously 
differentiable, then it is the unique solution to that equation. Furthermore, $u^*$ can be 
determined from $V^*$ by $u^*(x) = -\frac{1}{2} g^T(x) \nabla V^*(x)^T$. 

In most cases, the optimal value function cannot be determined 
analytically, and approximate solutions are used instead. However, for the system 
presented in this section, the optimal value function is known. In 
particular, for the infinite horizon optimal regulator problem with dynamics given 
by (A.1) with cost functional (A.2), the optimal value function is given 
by $V^*(x) = \frac{1}{2}x_1^2 + x_2^2$ and the associated optimal control law is given by 
$u^*(x) = -(\cos(2x_1) + 2)x_2$. More details can be found in [Ti]. 
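The stated closed-form solution can be checked numerically. The sketch below evaluates the HJB residual $x^T x + u^{*2} + \nabla V^*(x)(f(x) + g(x)u^*)$ at random points. It assumes that the drift and input fields are the ones written in the code (the benchmark form consistent with the stated $V^*$ and $u^*$); the sampling region and helper names are assumptions for illustration.

```python
import numpy as np


def f(x):
    """Assumed drift field of the two-state example."""
    c = np.cos(2.0 * x[0]) + 2.0
    return np.array([-x[0] + x[1],
                     -0.5 * x[0] - 0.5 * x[1] * (1.0 - c**2)])


def g(x):
    """Assumed input field of the two-state example."""
    return np.array([0.0, np.cos(2.0 * x[0]) + 2.0])


def grad_V_star(x):
    """Gradient of V*(x) = 0.5 * x1^2 + x2^2."""
    return np.array([x[0], 2.0 * x[1]])


def u_star(x):
    """Optimal control law u*(x) = -(cos(2 x1) + 2) x2."""
    return -(np.cos(2.0 * x[0]) + 2.0) * x[1]


def hjb_residual(x):
    """Right-hand side of the HJB equation with Q = I and R = 1."""
    u = u_star(x)
    return x @ x + u**2 + grad_V_star(x) @ (f(x) + g(x) * u)


rng = np.random.default_rng(1)
points = rng.uniform(-2.0, 2.0, size=(100, 2))
max_residual = max(abs(hjb_residual(x)) for x in points)
print(max_residual)
```

The residual vanishes up to rounding at every sampled point, and $u^*$ agrees with $-\frac{1}{2} g^T(x)\nabla V^*(x)^T$, so the pair is consistent with the HJB equation under the assumed dynamics.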

In this example, the infinite horizon optimal regulator problem is solved 
in real-time. The function $V^*$ is approximated by a function of the form 

$$\hat{V}(x, W_c) = \sum_{i=1}^{3} W_{c,i}\left(\exp\left(x^T c_i(x)\right) - 1\right)$$

where $W_c \in \mathbb{R}^3$ are weights to be adjusted in real-time, and $c_i(x) = x + d_i(x)$ 
where 


$$d_i(x) = 0.7\left(\frac{x^T x + 0.01}{1 + x^T x}\right)\begin{pmatrix} \cdots \end{pmatrix}^{T}$$
for $i = 1, 2, 3$. The approximation of the optimal control law is given by 

$$\hat{u}(x, W_a) = -\frac{1}{2} g^T(x) \nabla_x \hat{V}(x, W_a)^T$$

where $W_a \in \mathbb{R}^3$ are weights to be adjusted in real-time. In the framework of 
ADP, the functions $V^*$ and $u^*$ are replaced by their approximations $\hat{V}$ and 
$\hat{u}$, respectively, in the HJB equation, yielding a nonzero residual error, called 
the Bellman error (BE). The goal is to minimize the BE by adjusting 
the weights $W_a$ and $W_c$. If the BE is identically zero after the adjustment 
of the weights, then the optimal value function and its approximation 
coincide. For nonzero BE, the BE is used as a 
heuristic measure of the distance between $\hat{V}$ and $V^*$, as well as the distance 
between $\hat{u}$ and $u^*$. The weight update laws and subsequent convergence 
analysis can be found in [22]. 
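The approximations above can be sketched in a few lines. This is a hedged illustration rather than the companion paper's implementation: the three unit vectors in `DIRECTIONS` are hypothetical stand-ins for the center offsets, the gradient is taken with the centers held fixed (a simplifying assumption), and the dynamics are the assumed benchmark form of the example.

```python
import numpy as np

# Hypothetical unit directions for the offsets d_i(x) (an assumption).
DIRECTIONS = np.array([[0.0, 1.0],
                       [np.sqrt(3.0) / 2.0, -0.5],
                       [-np.sqrt(3.0) / 2.0, -0.5]])


def f(x):
    c = np.cos(2.0 * x[0]) + 2.0
    return np.array([-x[0] + x[1], -0.5 * x[0] - 0.5 * x[1] * (1.0 - c**2)])


def g(x):
    return np.array([0.0, np.cos(2.0 * x[0]) + 2.0])


def centers(x):
    """State-following centers c_i(x) = x + d_i(x), one row per kernel."""
    scale = 0.7 * (x @ x + 0.01) / (1.0 + x @ x)
    return x + scale * DIRECTIONS


def value_estimate(x, W_c):
    """V_hat(x, W_c) = sum_i W_c[i] * (exp(x^T c_i(x)) - 1)."""
    return float(W_c @ (np.exp(centers(x) @ x) - 1.0))


def value_gradient(x, W):
    """Gradient in x with the centers held fixed (simplifying assumption)."""
    c = centers(x)
    return (W * np.exp(c @ x)) @ c


def control_estimate(x, W_a):
    """u_hat(x, W_a) = -1/2 g(x)^T grad V_hat(x, W_a)^T."""
    return float(-0.5 * g(x) @ value_gradient(x, W_a))


def bellman_error(x, W_a, W_c):
    u = control_estimate(x, W_a)
    return float(x @ x + u**2 + value_gradient(x, W_c) @ (f(x) + g(x) * u))


x = np.array([0.5, -0.3])
W = np.array([0.4, 0.4, 0.4])  # arbitrary illustrative weights
print(value_estimate(x, W), control_estimate(x, W), bellman_error(x, W, W))
```

Because the centers move with the state, the three kernels always sit in a small neighborhood of the current state, and the offset radius shrinks as the state approaches the origin.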

The results of the numerical experiment are presented in Figure A.2. 
Figure A.2a indicates that the state is regulated to the origin when using the 
ADP algorithm combined with the StaF methodology. Figure A.2b shows 
that the weight vector $W_a$ converged as well. In typical StaF 
implementations, the weights are not expected to converge. However, since the optimal 
control problem is a regulator problem, the state and the centers ultimately 
occupy a fixed neighborhood of the origin, and the weights converge to the 
ideal weights corresponding to a small neighborhood of the origin. 
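As a point of reference for the regulation behavior described above, the known optimal controller can be simulated directly. The sketch below integrates the closed loop with a fixed-step fourth-order Runge-Kutta scheme and accumulates the running cost. The initial condition, step size, and horizon are assumptions, and the dynamics are the assumed benchmark form of the example; this is not the StaF ADP simulation itself.

```python
import numpy as np


def augmented_dynamics(z):
    """Closed loop under u*(x) = -(cos(2 x1) + 2) x2, with the running
    cost x^T x + u^2 appended as a third state."""
    x1, x2 = z[0], z[1]
    c = np.cos(2.0 * x1) + 2.0
    u = -c * x2
    return np.array([-x1 + x2,
                     -0.5 * x1 - 0.5 * x2 * (1.0 - c**2) + c * u,
                     x1**2 + x2**2 + u**2])


def rollout(x0, dt=5e-3, horizon=20.0):
    """Fixed-step RK4 integration; returns the final state and the cost."""
    z = np.array([x0[0], x0[1], 0.0])
    for _ in range(int(round(horizon / dt))):
        k1 = augmented_dynamics(z)
        k2 = augmented_dynamics(z + 0.5 * dt * k1)
        k3 = augmented_dynamics(z + 0.5 * dt * k2)
        k4 = augmented_dynamics(z + dt * k3)
        z = z + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return z[:2], z[2]


x0 = np.array([-1.0, 1.0])  # hypothetical initial condition
x_final, cost = rollout(x0)
v_star = 0.5 * x0[0]**2 + x0[1]**2
print(x_final, cost, v_star)
```

The state is driven to the origin and the accumulated cost agrees with $V^*(x_0) = \frac{1}{2}x_1^2 + x_2^2$ evaluated at the initial condition, which is the benchmark an approximate controller is compared against.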

When the weights converge, it is expected that $W_a$ and $W_c$ converge 
to the same values. The convergence is demonstrated by comparing Figure 
A.2b and Figure A.2c. The approximate controller and the optimal 
controller converge as well, as shown in Figure A.2d, and the value function 
estimation error, given in Figure A.2e, vanishes rapidly. 